AITopics | step size parameter

Collaborating Authors

step size parameter

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Supplement: Proximity Operator of the Matrix Perspective Function and its Applications Joong-Ho Won Department of Statistics Seoul National University wonj@stats.snu.ac.kr A Proofs A.1 A key lemma

Neural Information Processing SystemsOct-2-2025, 19:37:57 GMT

Proofs of both Theorems 2 and 4 are based on the following key lemma, Lemma A.1. To prove this lemma, we begin by recalling the definition of directional derivatives. F (x + t h) F (x) t if the limit exists. Now we can prove the lemma: Proof of Lemma A.1. The following lemma shows a representation of an element of this set in terms of M: Lemma A.3.

artificial intelligence, lemma, machine learning, (11 more...)

Neural Information Processing Systems

Country: Asia > South Korea > Seoul > Seoul (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.46)

Add feedback

Dynamical softassign and adaptive parameter tuning for graph matching

Shen, Binrui, Niu, Qiang, Zhu, Shengxin

arXiv.org Artificial IntelligenceSep-25-2022

This paper studies a framework, projected fixed-point method, for graph matching. The framework contains a class of popular graph matching algorithms, including graduated assignment (GA), integer projected fixed-point method (IPFP) and doubly stochastic projected fixed-point method (DSPFP). We propose an adaptive strategy to tune the step size parameter in this framework. Such a strategy improves these algorithms in efficiency and accuracy. Further, it guarantees the convergence of the underlying algorithms. Some preliminary analysis based on distance geometry seems to support that the optimal step size parameter has a high probability of 1 when graphs are fully connected. Secondly, it is observed that a popular projection method, softassign, is sensitive to graphs' cardinality(size). We proposed a dynamical softassign algorithm that is robust to graphs' cardinality. Combining the adaptive step size and the dynamical softassign, we propose a novel graph matching algorithm: the adaptive projected fixed-point method with dynamical softassign. Various experiments demonstrate that the proposed algorithm is significantly faster than several other state-of-art algorithms with no loss of accuracy.

artificial intelligence, machine learning, optimization problem, (18 more...)

arXiv.org Artificial Intelligence

2208.08233

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > China > Shaanxi Province > Xi'an (0.04)
(3 more...)

Genre:

Research Report (0.70)
Overview (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.46)

Add feedback

Theoretical Interpretation of Learned Step Size in Deep-Unfolded Gradient Descent

Takabe, Satoshi, Wadayama, Tadashi

arXiv.org Machine LearningJan-30-2020

Theoretical Interpretation of Learned Step Size in Deep-Unfolded Gradient Descent Satoshi Takabe † and Tadashi Wadayama Nagoya Institute of Technology, Gokiso, Nagoya, Aichi, 466-8555, Japan, {wadayama, s_takabe}@nitech.ac.jp † RIKEN Center for Advanced Intelligence Project, Chuo-ku, Tokyo, 103-0027, Japan Abstract --Deep unfolding is a promising deep-learning technique in which an iterative algorithm is unrolled to a deep network architecture with trainable parameters. In the case of gradient descent algorithms, as a result of the training process, one often observes the acceleration of the convergence speed with learned non-constant step size parameters whose behavior is not intuitive nor interpretable from conventional theory. In this paper, we provide a theoretical interpretation of the learned step size of deep-unfolded gradient descent (DUGD). We first prove that the training process of DUGD reduces not only the mean squared error loss but also the spectral radius related to the convergence rate. Next, we show that minimizing the upper bound of the spectral radius naturally leads to the Chebyshev step which is a sequence of the step size based on Chebyshev polynomials. The numerical experiments confirm that the Chebyshev steps qualitatively reproduce the learned step size parameters in DUGD, which provides a plausible interpretation of the learned parameters. Additionally, we show that the Chebyshev steps achieve the lower bound of the convergence rate for the first-order method in a specific limit without learning parameters or momentum terms. I NTRODUCTION Deep unfolding [10], [12] is a promising deep learning approach whose architecture is based on existing iterative algorithms with tuning parameters such as step sizes in gradient descent (GD). The recursive structure of the algorithm is unrolled to a deep network and some parameters are embedded into the network. These parameters can be trained using standard deep learning techniques such as back propagation and stochastic GD if all the processes in the algorithm are differentiable. One notable advantage of deep unfolding is the acceleration of the convergence speed that results from tuning parameters compared with the original algorithm. Embedding proper trainable parameters also offers a flexible network structure to the algorithm that is applicable, for example, to inverse problems with/without prior information [26]. Recently, theoretical aspects of deep unfolding have also been investigated [5], [21], [23]. MSE performance (upper) and learned step size parameters {γ t} 24 t 0 (lower) of DUGD (circles) and GD with a constant step size (cross marks) when (n,m) (300, 600).

algorithm, chebyshev step, step size parameter, (16 more...)

arXiv.org Machine Learning

2001.05142

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.24)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(3 more...)

Genre:

Research Report (0.50)
Instructional Material (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Learned Step Size Quantization

Esser, Steven K., McKinstry, Jeffrey L., Bablani, Deepika, Appuswamy, Rathinakumar, Modha, Dharmendra S.

arXiv.org Machine LearningFeb-21-2019

We present here Learned Step Size Quantization, a method for training deep networks such that they can run at inference time using low precision integer matrix multipliers, which offer power and space advantages over high precision alternatives. The essence of our approach is to learn the step size parameter of a uniform quantizer by backpropagation of the training loss, applying a scaling factor to its learning rate, and computing its associated loss gradient by ignoring the discontinuity present in the quantizer. This quantization approach can be applied to activations or weights, using different levels of precision as needed for a given system, and requiring only a simple modification of existing training code. As demonstrated on the ImageNet dataset, our approach achieves better accuracy than all previous published methods for creating quantized networks on several ResNet network architectures at 2-, 3- and 4-bits of precision.

gradient, quantization, step size, (14 more...)

arXiv.org Machine Learning

1902.08153

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback